-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic contempt #1394
Dynamic contempt #1394
Conversation
Let me open a PR. There has been some discussion going on here |
The reprosearch testing is failing (locally too), and I do not understand why. Any ideas? |
Ah, I found out. I have to initialize the contempt. |
Interesting idea. Maybe next step is to remove static contempt and use only dynamic contempt, this [-3, 1], if succeeds would get rid of combo boxes and other fancy stuff in one go, simplifying a lot UI. |
Thx @mcostalba. I think your suggestion is smart indeed. http://tests.stockfishchess.org/tests/view/5a782de40ebc5902971a98cf |
If we use this formula:
then it is a simplification compared to current master (it removes the UCI Contempt option) If we use
then is is a complexification compared to current master. Users will want to automatically set the UCI Contempt to zero, etc, and the combo box will still be there :-) |
Are you sure? I think this implementation does not suffer from the asymmetry of the current one since it is centered on zero and covaries with best value. But maybe I am wrong. In which case you would not need the combo box but just the old contempt value. |
I will say again that this patch lets different threads modify the same global Eval::Contempt variable. That should be cause for concern. |
The best would be removing contempt uci option altogether. Maybe contempt
is just a special case of a more general formula:
C = c0 + c1*bestValue
Where current contempt is c0
…On Monday, February 5, 2018, syzygy1 ***@***.***> wrote:
I will say again that this patch lets different threads modify the same
global Eval::Contempt variable. That should be cause for concern.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1394 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABDGAW1YY9rflJ1cg6sTIu2b42ZWLBzXks5tRuBhgaJpZM4R4-2S>
.
|
Maybe some tuning session would help.
I foresee 2 sessions. One standard in self play, another against a fixed
lower level opponent like sf8 of course only master would have tuning code
enabled in the latter case.
…On Monday, February 5, 2018, Marco Costalba ***@***.***> wrote:
The best would be removing contempt uci option altogether. Maybe contempt
is just a special case of a more general formula:
C = c0 + c1*bestValue
Where current contempt is c0
On Monday, February 5, 2018, syzygy1 ***@***.***> wrote:
> I will say again that this patch lets different threads modify the same
> global Eval::Contempt variable. That should be cause for concern.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#1394 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/ABDGAW1YY9rflJ1cg6sTIu2b42ZWLBzXks5tRuBhgaJpZM4R4-2S>
> .
>
|
I already asked this elsewhere. If contempt depends on bestValue doesn't that create a positive feedback loop (as bestValue again depends on contempt)? It is not meant as a criticism. I just want to make sure I understand things correctly. |
Test will tell, I see no other way to know.
…On Monday, February 5, 2018, vdbergh ***@***.***> wrote:
I already asked this elsewhere.
If contempt depends on bestValue doesn't that create a positive feedback
loop (as bestValue again depends on contempt)? It is not meant as a
criticism. I just want to make sure I understand things correctly.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1394 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABDGAaSrfCEa86b3M3yCiWkMfOTgKZDAks5tRueGgaJpZM4R4-2S>
.
|
@mcostalba If you have nothing to say then please keep quiet.
What kind of an answer is that?? How can a test show if there is a positive feedback loop or not? This is a property of the algorithm. |
Yes, there is some feedback. I had wanted to say that contempt increases by 10% per iteration, but that is probably not true. The base contempt setting is constant. In a neutral position, about 10% is added in the first iteration. In the second iteration another 10% of that 10% is added, so 11% total. Then 11.1% etc. So the amount of feedback seems to stay within limits. Unless I am overlooking something. |
I think @syzygy1 got it right. The dynamic contempt contribution is a fraction of the best value depending on the search iteration in the way Ronald described it. This it the geometric series over and over again. |
@syzygy1 @Stefano80 Thanks. Now I did the excercise myself (I should have done it before but was busy). Assume u is the true value of the position and c is the static contempt. Let b be bestValue and let cc be the dynamic content. In first approximation the recursion is given by b <- cc+u, cc <- c+b/10. The fix point is given by b=(c+u)*1.111... (as Ronald was saying). So the expected effect of this patch would be to gradually increase the value of bestValue (if c+u>0), but not in an uncontrolled way. It is not clear to me if any conclusions can be drawn from this. Maybe there is some interaction with A-B search resulting in faster cutoffs? |
Maybe, maybe not. But be careful: the effect is to gradually increase the fraction of the best value which is added to contempt, not best value itself. |
@Stefano80 Since ultimately bestValue (in first approximation) is the sum to the true value and contempt, bestValue increases as well. |
Depends whether best value is positive or negative for the root colour, in one case it will increase, in the other decrease. |
@Stefano80 Yes this is what I wrote in my original post. Sorry that I did not repeat it in my reply. |
@Stefano80 there are two failures in your CI travis test. As @snicolet was pointing out, the first one is a trailing '.' in your bench, the second one is a race condition in your code (as @syzygy1 has pointed out). |
9ed5f64
to
cee1828
Compare
I fixed the commit message, rebased and squashed all together. The race is condition is most probably harmless, but the decision is yours in the end. If you want to merge but only after fixing the race condition, I will need some help. At the moments, the SMP test is inconclusive, although slightly positive, http://tests.stockfishchess.org/tests/view/5a7802290ebc5902971a98c4 What I do not understand: I fixed the bench and now a test that was previously successful is failing. |
As long as the SMP test is inconclusive, there is reason to think that letting all threads modify Eval::Contempt independently does not work very well. Perhaps it makes sense to let only the main thread adjust Eval::Contempt? To remove the data race, make Eval::Contempt an atomic variable and read and write to it with "relaxed" ordering. |
If contempt is only changed at the root at every iteration, would that (race condition) really be a problem, I mean could the be some slowdown or something because of the contempt variable being accessed by multiple threads? Don't know much about that but seems a bit improbable. And it would quickly become less of a problem, once the time between iterations increases? With more threads, the effect of re-searching that you get from contempt is expected to be less because all the threads already do this re-searchng... |
Make Eval::Contempt atomic, you don't need anything else. Even relaxed write is a useless overkill considering that this is far from fast path. Just define atomic and that's all. It is used also as self documenting that is a shared variable. |
Eval::Contempt is read in each call to evaluate(). |
@Kingdefender A more serious question is what this all means for the effect of the patch on SMP. |
@Stefano80 The idea is to have independent evidence of elo gain (science is based on repeatability of experiments). If the test is inconclusive then there is no independent evidence of elo gain. But I wonder why you reject a serious test ?? I agree 100000 games is a lot, but given the novelty of the patch I think it is warranted. SPRT(-3,1) tests routinely run for 100000 games and nobody complains about this. |
@vdbergh I don't understand what you are actually arguing about. I am just testing to check whether this capping mechanism is actually as safe as I think it is. We already have independent evidence of Elo gain from a lot of different fishtest contributors. This is the reason why we have a distributed test framework with SPRT tests. You seem to be extremely convinced that dynamic contempt is no good. I will not reply to you anymore until you adopt a more realistic position. |
Sorry, I do not know where you get that from. There is only one single test that has shown an elo gain for contempt in self play with a statistically significant result and that is yours. Given that so many tests have been run, and given the unknown principle by which the contempt mechanism works (in self play), asking for an independent confirmation to remove all doubts that your test was not a statistical fluke (which of course happens) is just standard scientific practice. Moreover such an independent confirmation can be trivially achieved. But as I said. I am just recording my personal opinion so that I can refer to it later. |
Also, whereas Stefano's test suggests that convincingly "static contempt"+"dynamic adjustment" > "static contempt" if "static contempt"=20 the recent failed test http://tests.stockfishchess.org/tests/view/5a7986450ebc5902971a9979 does not confirm this for "static contempt"=0. This is at least somewhat puzzling as generally an idea - in this case the dynamic adjustment - should be able to stand on its own feet, especially if it would give such a convincing elo gain as suggested by Stefano's test. |
This is helping the contempt, but only if contempt exists. We are not going to doubt any (0,5) passed test on principle, as @mcostalba has said many times we are gladly accepting flukes because they happen 1 time out of 20. But this is clearly not something that can replace base contempt as hoped. I think that because contempt manipulation is a new thing, many implementations will come to replace the initial ones and by accepting this it does not mean that it will stay forever. We will have in mind that in future unified contempt tries this code can be removed or altered. Moving on is better than debating. Just accept and thus enable outside testing of the world and then proceed accordingly with the reports. @vdbergh Contempt is a new concept which is puzzling for many, nothing to worry about. Things will get clear in due time. |
Not doubting SPRT tests is good practice for patches that follow the standard pattern of improving eval and tweaking search. The working of such patches is clearly understood - both from a theoretical and a practical point of view - and an occasional false positive is totally harmless. IMHO one should be more strict about patches whose working is not understood. Before expending a lot of energy on them one should make absolutely sure that they really do gain elo. Scientists say: extraordinary claims require extraordinary proof. Recall that the dynamic contempt attempt from 2014 also first passed STC and LTC testing but then ultimately turned out not to work. |
Such an initialization of an object of type |
Thanks @atumanian for pointing to the right resource. |
@Stefano80, you are welcome. C++17 may have changed this. In the new version of C++ this initialization may work due to copy elision. But I haven't tried this. |
By coincidence, I just watched a CPPCON youtube video about atomic that showed some surprising things that you cannot do. Initialization syntax was one of them. Maybe it will help. https://www.youtube.com/watch?v=ZQFzMfHIxng lee |
Thx to all, now it looks like the CI tests work, although two of them are taking very long. I scheduled a LTC multithreading test. |
@Stefano80 Some coding style notes:
contempt = bestValue > 500? 50:
bestValue < -500? -50:
bestValue / 10;
contempt += Options["Contempt"] * PawnValueEg / 100; // From centipawns |
...or even better, the opposite: contempt = Options["Contempt"] * PawnValueEg / 100; // From centipawns
contempt += bestValue > 500? 50: // Dynamic contempt
bestValue < -500? -50:
bestValue / 10; |
..finally, I would express the bestvalue limit in terms of PawnValueEg , something like: contempt += bestValue > 2 * PawnValueEg ? 50: // Dynamic contempt
bestValue < -2 * PawnValueEg ? -50:
bestValue / 10; |
Thx Marco, will do soon. |
Hi @snicolet, all subsequent tests after the initial SPRT are at around +1 Elo. Are you going to commit? If not I will save the time and close the PR before implementing Marco's suggestions. |
@Stefano80 My current inclination is to commit this idea, it looks promising. But there is no need to rush, I want that we take the time to get it right. We can wait for the tests to finish. You can implement Marco's style suggestions on top of this dynamicContempt branch, they are good ones and we shall keep this branch as our working branch for now. Concerning the multithreaded Contempt, it is good that we have managed to make it atomic and change it in search(). Isn't there an uninitialized variable problem however, if we use a code path where we call evaluate(pos) outside of search(). For instance, if I type the following debugging sequence in a terminal: ./stockfish The first eval will call So some details to get right, but again, on the grand scale, I am very pleased that the idea works, you imagine :-) |
Will dynamic contempt backfire in various imbalance position which usually favour Komodo? Assuming SF vs K. |
The bug for the 'eval' debugging command is fixed in master in this commit: 211ebc5 |
Yes, I noticed the problem, the eval contempt is initialized with 0 it seems. The problem with the value not being initialized with the default contempt is already present in master, right? |
0eb9ea3
to
6334237
Compare
So, I implemented the changes suggested by @mcostalba (with one small difference), squashed, rebased and updated the bench. |
Why mixing PawnValueEg and PawnValueMg? This could raise some 'hmmm' in people reading the code. |
…by 10%. Include suggestions from Marco Costalba, Aram Tumanian, Ronald de Man, Stephane Nicolet. Bench: 5791090
6334237
to
2e4f1b3
Compare
Oops, no reasons. I just am so used to code referencing the MG value that I did not notice that we have PawnValueEg here. Fixed. |
Merged via this commit : cb13243 . Time for a code freeze during a couple of days so that people can prepare optimized version for TCEC is they feel like :-) I have merged the version of "Dynamical Contempt" which was currently running with good results at LTC and in multithread mode (so using the [-50..50] cap instead of Marco's [-48..48] resulting from the cleaner code with PawnValueEg), because it was safer to use a tested version. During the code freeze we can test the interval [-2 * PawnValueEg .. 2 * PawnValueEg] for the cap instead of [-500..500] as a SPRT(-3..1), and I am sure that it will pass easily. Bravo à tous :-) |
Adjust contempt based on current score.
STC
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 110052 W: 24614 L: 23938 D: 61500
LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 16470 W: 2896 L: 2705 D: 10869